Large Language Model Node
The Large Language Model (LLM) Node generates AI-powered text responses using configurable language models from providers such as OpenAI, Anthropic, Google, Ollama, and VLLM. It supports system prompts for behavior configuration, conversation history for context-aware dialogues, and temperature settings for response creativity. The node works with both cloud-based commercial providers and self-hosted model deployments.
How It Works
The Large Language Model node connects to AI models through two deployment approaches: VIDIZMO's pre-configured cloud providers or models running on your own infrastructure. Pre-configured providers connect to commercial services (OpenAI, Anthropic, Google) through centralized credentials stored in the AI service configuration, eliminating API key management in individual workflows. Self-hosted models connect directly to AI servers on your infrastructure (Ollama, VLLM), providing full control over model deployment and data privacy while eliminating external API dependencies.
During execution, the node constructs a request containing your prompts and sends it to the selected model. For conversation-aware workflows, the node can include previous messages from conversation history, allowing the model to understand context from earlier exchanges. The model processes the request, generates a text response based on your configuration, and returns it to the workflow. The response is stored in the specified output variable, making it available for downstream nodes.
The node supports single-turn interactions where each request is independent, and multi-turn conversations where the model maintains context across exchanges. Conversation history is managed automatically when enabled, with each interaction adding to the accumulated context. This enables natural dialogue flows where the model can reference earlier parts of the conversation, though token usage increases over time as history grows.
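The request format is handled internally by the node, but conceptually it resembles a chat-style payload built from the system prompt, the user prompt, and (optionally) accumulated history. The sketch below is illustrative only, assuming an OpenAI-style message list; the field names are not the node's actual schema.

```python
# Illustrative sketch only -- the node assembles and sends this request internally.
# Field names assume an OpenAI-style chat payload, not the node's actual schema.

conversation_history = []  # grows only when history features are enabled

def build_request(system_prompt, user_prompt, model, temperature, include_history=True):
    messages = [{"role": "system", "content": system_prompt}]
    if include_history:
        messages.extend(conversation_history)   # prior turns add context (and tokens)
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "temperature": temperature, "messages": messages}

# Each persisted turn appends to history, so token usage grows as the conversation continues.
request = build_request(
    system_prompt="You are a helpful assistant that provides concise answers.",
    user_prompt="Summarize the following text: ...",
    model="gpt-4",
    temperature=0.7,
)
```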
Configuration Parameters
Output Field
Output Field (Text, Required): Workflow variable where the AI-generated response is stored.
The output is the AI-generated text response as a string. The response format depends on the Output Format parameter (Text or JSON).
Text format output:
The document discusses three main topics: data security, compliance requirements, and implementation timelines. Each section provides detailed guidelines for enterprise deployment.
JSON format output:
```json
{
  "summary": "Overview of enterprise deployment guidelines",
  "key_points": [
    "Data security protocols",
    "Compliance requirements",
    "Implementation timelines"
  ],
  "sentiment": "neutral"
}
```
Common naming patterns: llm_response, generated_text, ai_answer, summary, analysis
System Prompt
System Prompt (Text, Optional): Instructions that set the AI's behavior, persona, or constraints.
System prompts define how the model responds across all interactions, establishing consistent behavior patterns. Variable interpolation with ${variable_name} is supported to insert dynamic context.
Examples:
- You are a helpful assistant that provides concise answers.
- You are a technical writer. Explain concepts clearly with examples.
- Analyze the following data and provide insights: ${context}
User Prompt
User Prompt (Text, Required): The main query or input for the LLM.
This parameter contains the actual question or task for the model to process. Variable interpolation with ${variable_name} is supported to dynamically insert data from previous workflow nodes.
Examples:
- Summarize the following text: ${input_text}
- What are the key points in this document?
- Generate 5 product descriptions for: ${product_name}
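Variable interpolation behaves the same way in both the System Prompt and User Prompt: `${variable_name}` placeholders are replaced with workflow variable values before the request is sent. Resolution is performed by the workflow engine; the sketch below only illustrates the substitution concept, using hypothetical variable names.

```python
import re

# Hypothetical workflow variables produced by upstream nodes.
workflow_variables = {
    "input_text": "Quarterly revenue grew 12% while operating costs fell 3%...",
    "product_name": "Acme Analytics Suite",
}

def interpolate(template: str, variables: dict) -> str:
    """Replace ${name} placeholders with workflow variable values."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: str(variables.get(m.group(1), m.group(0))),
                  template)

print(interpolate("Summarize the following text: ${input_text}", workflow_variables))
```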
Output Format
Output Format (Dropdown, Default: Text): Format of the LLM response.
| Format | Output | Use for |
|---|---|---|
| Text (default) | Plain text response | Standard text generation, summaries, answers |
| JSON | Structured JSON response | Structured data extraction, API responses, data processing |
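When JSON format is selected, a downstream step that can evaluate code (the exact node type depends on your workflow) can parse the stored output variable directly. A minimal sketch, assuming the output variable `llm_response` holds the JSON string shown earlier; models can occasionally emit malformed JSON, so validate before relying on the structure.

```python
import json

# llm_response is the output variable written by this node when Output Format is JSON.
llm_response = (
    '{"summary": "Overview of enterprise deployment guidelines", '
    '"key_points": ["Data security protocols", "Compliance requirements", '
    '"Implementation timelines"], "sentiment": "neutral"}'
)

data = json.loads(llm_response)        # parse the structured response
for point in data["key_points"]:
    print(point)
```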
Include History
Include History (Toggle, Default: ON): Controls whether previous messages in the conversation are included for context.
When enabled, the LLM receives the full conversation history, enabling context-aware multi-turn dialogues where the model can reference earlier exchanges. When disabled, only the current message is processed without historical context, treating each request as independent.
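Conceptually, the toggle decides whether prior turns are prepended to the current request. The sketch below is illustrative only, reusing the chat-style message list from the earlier example.

```python
# Illustrative only: what the model sees with Include History ON vs OFF.
history = [
    {"role": "user", "content": "What is our data retention policy?"},
    {"role": "assistant", "content": "Records are retained for seven years."},
]
current = {"role": "user", "content": "Does that apply to video archives too?"}

with_history = history + [current]   # ON: the model can resolve "that" from earlier turns
without_history = [current]          # OFF: the follow-up arrives with no prior context
```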
Persist Message
Persist Message (Toggle, Default: ON): Controls whether the current interaction is saved to conversation history.
When enabled, the message is stored for future requests, building up context over time. When disabled, the message is processed but not stored in history, which is useful for one-off queries that should not affect conversation context.
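In short, Include History controls what is read from history for the current request, while Persist Message controls what is written back. A small, illustrative sketch of the write side:

```python
# Illustrative only: Persist Message decides whether the current exchange is written back.
history = []                           # stored conversation history
current_exchange = [
    {"role": "user", "content": "Does that apply to video archives too?"},
    {"role": "assistant", "content": "Yes, video archives follow the same policy."},
]

persist_message = True
if persist_message:
    history.extend(current_exchange)   # future requests that include history will see this turn
# When OFF, the reply is still produced but the stored history stays unchanged.
```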
Use Self-Hosted Model
Use Self-Hosted Model (Toggle, Default: OFF): Controls whether to use pre-configured models or provide custom model configuration.
| Mode | Configuration | Use for |
|---|---|---|
| OFF (default) | Models configured in AI service settings with centralized credentials | Commercial providers (OpenAI, Anthropic, Google) with VIDIZMO-managed authentication |
| ON | Requires Base URL, API Key, and other configuration in this node | Self-hosted models (Ollama, VLLM) or custom endpoints with direct authentication |
Model Provider
Model Provider (Dropdown, Default: Ollama): The AI provider that hosts the model.
| Provider | Example models | Authentication |
|---|---|---|
| OpenAI | GPT-4, GPT-3.5-turbo, GPT-4-turbo | API Key |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku | API Key |
| Google | Gemini Pro, Gemini 1.5 Pro | API Key |
| Ollama | Llama 3, Mistral, Phi 3, custom models | Base URL |
| VLLM | Custom models | Base URL |
Model Name
Model Name (Text, Required): The specific model identifier to use.
Different models have different capabilities, context windows, and pricing. Variable interpolation with ${variable_name} is supported for dynamic model selection based on workflow logic.
Examples: gpt-4, claude-3.5-sonnet, llama3, gemini-pro, mistral
Base URL
Base URL (Text, Conditional): The base URL endpoint for the model provider API.
Required when Model Provider is Ollama or VLLM. This is the base URL where the self-hosted model server is running. Variable interpolation with ${variable_name} is supported for dynamic endpoint selection.
Examples: http://localhost:11434, http://192.168.1.100:8000
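For a self-hosted Ollama deployment, it can help to confirm that the Base URL is reachable before wiring it into the node. A minimal sketch, assuming a standard Ollama server (its `/api/tags` endpoint lists locally available models); adjust the check for VLLM or other servers.

```python
import json
import urllib.request

base_url = "http://localhost:11434"   # same value you would enter in the Base URL field

# Ollama exposes GET /api/tags to list models pulled onto the server.
with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
    models = json.load(resp)

print([m["name"] for m in models.get("models", [])])   # e.g. ['llama3:latest', 'mistral:latest']
```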
API Key
API Key (Text, Conditional): The API key for authenticating with commercial model providers.
Required when Model Provider is OpenAI, Anthropic, or Google. The key authenticates requests and tracks usage for billing. Variable interpolation with ${variable_name} is supported to load keys from secure workflow variables.
Variables such as ${api_key} from secure sources are recommended instead of hardcoding keys in the workflow.
Temperature
Temperature (Number, Default: 0.7): Controls the randomness of model responses (0–2).
Lower values produce more deterministic and focused outputs; higher values produce more creative and varied outputs. Variable interpolation with ${variable_name} is supported for dynamic temperature adjustment.
| Range | Behavior | Use for |
|---|---|---|
| 0–0.3 | Deterministic, factual, consistent | Summarization, data extraction, factual Q&A, classification |
| 0.4–0.7 (default) | Balanced creativity and consistency | General-purpose tasks, conversational AI |
| 0.8–2.0 | Creative, varied, exploratory | Content generation, brainstorming, creative writing, ideation |
Max Token Limit
Max Token Limit (Number, Default: 16000): Maximum number of tokens the model can process in a single request.
This includes both input (prompts and conversation history) and output (generated response) tokens. Variable interpolation with ${variable_name} is supported.
Common model limits:
- GPT-4: 8,192 tokens
- GPT-4-turbo: 128,000 tokens
- Claude 3: 200,000 tokens
- Llama 3.2: 128,000 tokens
- Gemini 1.5 Pro: 1,000,000 tokens
Lower token limits reduce costs and latency. Higher limits support longer conversations and documents.
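Exact token counts depend on each model's tokenizer, but a rough rule of thumb (about four characters per token for English text) is often enough to sanity-check whether prompts, history, and the expected response fit within the configured limit. A hedged sketch of that heuristic:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text.
    Use the model's own tokenizer when an exact count matters."""
    return max(1, len(text) // 4)

system_prompt = "You are a helpful assistant that provides concise answers."
user_prompt = "Summarize the following text: <document text here>"
history_text = ""                      # concatenated prior turns, if Include History is ON
expected_response_tokens = 1024        # budget reserved for the model's answer

input_tokens = rough_token_estimate(system_prompt + user_prompt + history_text)
print(input_tokens + expected_response_tokens <= 16000)   # stays within the default limit?
```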
Reasoning
Reasoning (Toggle, Default: OFF): Controls reasoning/thinking mode for supported models.
When enabled, the model uses extended reasoning processes before generating responses, producing more thoughtful and analytical outputs. When disabled, the model generates responses using standard processing. Variable interpolation with ${variable_name} is supported for dynamic control. This feature is available for models that support reasoning mode, such as Ollama thinking models.
Common Parameters
This node supports common parameters shared across workflow nodes, including Stream Output Response, Logging Mode, and Wait For All Edges. For detailed information, see Common Parameters.
Best Practices
- Use system prompts to establish consistent AI behavior across multiple requests in a workflow
- Use variable interpolation to build dynamic prompts that adapt to upstream node outputs
- Use lower temperature values (0–0.3) for factual tasks and higher values for creative tasks
- Set token limits to match model capabilities to avoid truncation
- Enable conversation history for multi-turn dialogues where context matters, but monitor token usage as history accumulates
- Consider self-hosted models (Ollama, VLLM) for data privacy and to reduce API costs
- Store API keys in secure workflow variables rather than hardcoding them
Limitations
- Model Availability: Pre-configured models require VIDIZMO AI service configuration. Self-hosted models require running instances.
- Token Limits: Each model has maximum token limits. Exceeding limits results in truncated responses or errors.
- API Rate Limits: Commercial providers (OpenAI, Anthropic, Google) enforce rate limits based on subscription tier.
- Conversation History: History accumulates tokens over time. Long conversations may exceed model context windows.